Experiments with MRAI Time Stepping Schemes on a Distributed Memory Parallel Environment
Abstract
Implicit time stepping is often difficult to parallelize. The recently proposed Minimal Residual Approximate Implicit (MRAI) schemes [2] are designed as a cheaper, parallelizable alternative to implicit time stepping. A few GMRES iterations are performed to solve the implicit scheme of interest approximately, and the step size is adjusted to guarantee stability. A natural way to apply the approach is to modify a given implicit scheme of interest. Here, we present numerical results for two parallel implementations of MRAI schemes. One is based on the simple Euler Backward scheme, and the other is the MRAI-modified multistep ODE solver LSODE. On the Cray T3E and IBM SP2 platforms, the MRAI codes exhibit the parallelism of explicit schemes. The model problem under consideration is the 3D spatially discretized heat equation. Speed-up results for the Cray T3E and IBM SP2 are reported.

1 MRAI time stepping

Assume that, to solve a system of $N$ ODEs $y' = f(t, y)$, we are interested in some implicit time stepping scheme, for instance the Euler Backward (EB) scheme

$$y^{n+1} - y^n = \tau f(t_{n+1}, y^{n+1}). \qquad (1)$$

This nonlinear system in $y^{n+1}$ is usually linearized, and the corresponding Jacobian linear system

$$(I - \tau J)(y^{n+1} - y^n) = \tau f(t_{n+1}, y^n), \qquad J = \frac{\partial f}{\partial y}(t_{n+1}, y^n),$$

is solved approximately. In Newton's process this procedure is repeated.

The basic idea of MRAI time stepping [2] is very simple: at each time step, we solve the Jacobian system approximately with $k$ steps of GMRES [7]. The number of iterations $k$ is fixed and small (say, 5). An MRAI scheme is only an approximation of an implicit scheme and is therefore not unconditionally stable. A step size control for stability is proposed in [2]; it is based on information delivered by the GMRES process. In MRAI schemes one does not control the residual reduction achieved by GMRES, and the number of iterations $k$ is kept fixed. This makes the overall scheme quite simple.
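The MRAI idea of Section 1 can be illustrated with a small, serial sketch: one Euler Backward step in which the Jacobian system is solved only approximately by a fixed number k of GMRES iterations. This is not the authors' code. The names `gmres_fixed`, `mrai_eb_step` and `heat_rhs_1d` are hypothetical, the Jacobian-vector product uses a simple finite-difference approximation (an assumption), a 1D heat equation stands in for the 3D model problem to keep the example short, and the stability-based step size control of [2] is omitted, so the caller supplies a fixed step size `tau`.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gmres_fixed(matvec, b, k):
    """At most k GMRES iterations for A x = b, starting from x = 0.
    Returns the approximate solution and the final residual norm."""
    beta = math.sqrt(dot(b, b))
    if beta == 0.0:
        return [0.0] * len(b), 0.0
    V = [[bi / beta for bi in b]]   # orthonormal Krylov basis vectors
    R = []                          # columns of the triangularized Hessenberg matrix
    g = [beta]                      # rotated right-hand side of the small problem
    rots = []                       # Givens rotations (c, s)
    for j in range(k):
        w = matvec(V[j])
        h = []
        for v in V:                 # modified Gram-Schmidt orthogonalization
            hij = dot(w, v)
            h.append(hij)
            w = [wi - hij * vi for wi, vi in zip(w, v)]
        hnext = math.sqrt(dot(w, w))
        h.append(hnext)
        for i, (c, s) in enumerate(rots):   # apply earlier rotations to the new column
            h[i], h[i + 1] = c * h[i] + s * h[i + 1], -s * h[i] + c * h[i + 1]
        r = math.hypot(h[j], h[j + 1])      # new rotation eliminates h[j + 1]
        c, s = h[j] / r, h[j + 1] / r
        rots.append((c, s))
        h[j] = r
        g.append(-s * g[j])
        g[j] = c * g[j]
        R.append(h[:j + 1])
        if hnext <= 1e-14 * beta:           # Krylov space exhausted
            break
        V.append([wi / hnext for wi in w])
    m = len(R)
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):          # back substitution on the triangular system
        acc = g[i] - sum(R[jj][i] * coef[jj] for jj in range(i + 1, m))
        coef[i] = acc / R[i][i]
    x = [sum(coef[jj] * V[jj][i] for jj in range(m)) for i in range(len(b))]
    return x, abs(g[m])

def mrai_eb_step(f, t, y, tau, k=5, eps=1e-6):
    """One approximate Euler Backward step: k GMRES iterations on
    (I - tau*J) d = tau*f(t + tau, y), then y_new = y + d.
    No stability control is applied here; tau is taken as given."""
    t1 = t + tau
    f0 = f(t1, y)

    def matvec(v):
        # Assumed finite-difference approximation of J v (GMRES passes unit vectors).
        fv = f(t1, [yi + eps * vi for yi, vi in zip(y, v)])
        return [vi - tau * (a - b) / eps for vi, a, b in zip(v, fv, f0)]

    d, res = gmres_fixed(matvec, [tau * fi for fi in f0], k)
    return [yi + di for yi, di in zip(y, d)], res

def heat_rhs_1d(t, y):
    """Stand-in model: 1D heat equation on (0, 1), zero Dirichlet boundaries."""
    n = len(y)
    inv_h2 = float((n + 1) ** 2)
    return [inv_h2 * ((y[i - 1] if i > 0 else 0.0) - 2.0 * y[i]
                      + (y[i + 1] if i < n - 1 else 0.0)) for i in range(n)]

if __name__ == "__main__":
    n = 16
    y = [math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n)]
    y, res = mrai_eb_step(heat_rhs_1d, 0.0, y, tau=0.01)
    print("GMRES residual norm after one step:", res)
```

In this sketch the only operations on full-length vectors are evaluations of f, inner products and vector updates. In a distributed-memory setting the inner products would presumably become global reductions, while the remaining work stays local apart from the neighbour exchange needed to evaluate f.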
2 Parallelization of MRAI and numerical experiments

It is well known how to parallelize conjugate-gradient-type iterative methods such as GMRES (see e.g. [9, 1, 5]). In our experience, on platforms such as the IBM SP2 and Cray T3E there is no need for the modifications proposed in [5, 1].

As a model problem we take a spatially discretized 3D heat equation. (This model problem is used in [8].) The standard 7-point stencil finite difference discretization on the 40 × 40 × 40 spatial grid leads to a system of size N = 64 000. The numerical integration is done for t ∈ [0, 0.7].

In our tests, we use two experimental MRAI codes. The first is based on the simple Euler Backward scheme (we refer to this code as EB/MRAI); the second is the MRAI-modified stiff ODE solver LSODE (the LSODE/MRAI code). In [2], the performance of the LSODE/MRAI code was tested and compared with the RKC [8] and VODPK [3] codes. For the model problem under consideration, the EB/MRAI code achieves a CPU time gain factor of 3.2 over the Euler Forward scheme. Both codes use matrix-free Jacobian evaluation (see e.g. [6, 4]). The number of GMRES steps was always k = 5. Both tolerance parameters, atol and rtol, in the LSODE/MRAI code were set to 10. In the EB/MRAI code the step size was chosen automatically on the basis of the MRAI stability control [2].

In the parallel versions of the codes, we used the MPI communication library. It appears that on both the IBM SP2 and Cray T3E platforms these MRAI codes possess the parallelism of explicit schemes, i.e. the speed-up is restricted only by the evaluations of f. A simple analysis based on Amdahl's law suggests that for the MRAI schemes the speed-up is of the form